Chapter 2 will focus on one responsive gene discovery problem.
Nowadays, essential gene discovery is mainly based on high-throughput
transposon sequencing technology. With this technology, a large number
of mutants can be generated in one experiment. Gene essentiality is
determined by estimating a density function for a transposon statistic,
such as the transposon insertions per gene statistic or the transposon
insertion sites per gene statistic. This is a typical unsupervised
learning problem. Therefore, various density estimation algorithms and
cluster analysis algorithms will be introduced in this chapter, and
their use for essential gene discovery will be demonstrated.
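
As a rough illustration of the unsupervised setting described above, the following sketch splits genes into a low-insertion and a high-insertion group with a one-dimensional two-cluster k-means. It is only a minimal stand-in for the density estimation and cluster analysis algorithms of the chapter, not the method used there. The gene names, the insertion counts, the log transform and the function name `two_means_1d` are all assumptions made for this example.

```python
import math


def two_means_1d(values, iterations=100):
    """Split 1-D values into a low and a high cluster (k-means, k = 2)."""
    low, high = min(values), max(values)  # deterministic initial centres
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer centre (0 = low, 1 = high).
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low_members = [v for v, lab in zip(values, labels) if lab == 0]
        high_members = [v for v, lab in zip(values, labels) if lab == 1]
        new_low = sum(low_members) / len(low_members)
        # If every value is identical, the high cluster stays empty.
        new_high = sum(high_members) / len(high_members) if high_members else high
        if new_low == low and new_high == high:
            break  # centres no longer move
        low, high = new_low, new_high
    return labels, (low, high)


# Hypothetical transposon insertions per gene (invented data).
insertions = {
    "geneA": 0,
    "geneB": 2,
    "geneC": 45,
    "geneD": 60,
    "geneE": 1,
    "geneF": 38,
}

# Log-transform the counts so a few very large values do not dominate.
statistic = [math.log1p(count) for count in insertions.values()]
labels, centres = two_means_1d(statistic)

# Genes in the low-insertion cluster are candidate essential genes.
candidates = [gene for gene, lab in zip(insertions, labels) if lab == 0]
print(candidates)  # ['geneA', 'geneB', 'geneE']
```

Genes that tolerate few or no insertions fall into the low cluster. In practice the decision would rest on a fitted density for the statistic rather than on a hard two-way split.
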
Chapter 3 will focus on the peptide pattern discovery problem. This
problem is a type of application in which protein functional sites are
identified using local protein structures, i.e., protein peptides.
Projects addressing this problem mainly include the discovery of
protease cleavage sites or post-translational modification sites in
peptides. The same idea also applies to the discovery of DNA binding
sites, transcription factor sites, etc. Protein functional site
discovery employs two types of peptides, i.e., those having no
functional sites and those having functional sites verified in
laboratories. The latter refers to protease-cleaved peptides or
post-translationally modified peptides. This type of application fits
the mainstream of machine learning, i.e., classification analysis or
discriminant analysis. This chapter will introduce various
classification analysis algorithms and demonstrate how these algorithms
are used for peptide pattern discovery. A typical problem in protein
functional site discovery is the data type. A protein sequence is a
string of amino acids, which are non-numerical. Therefore, this chapter
will introduce different encoding approaches for handling amino acids
in peptides so that a machine learning algorithm, which needs numerical
data as input, can be used.
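
To make the encoding issue concrete, the sketch below shows one common way of turning a peptide into numbers: one-hot (orthogonal) encoding, in which each residue becomes a 20-element binary vector. This is only one of several possible encodings and is not necessarily the one the chapter prefers. The function name `one_hot_encode` and the example peptide are assumptions made for illustration.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {residue: i for i, residue in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Encode a peptide as a flat list of 0/1 values, 20 per residue."""
    vector = []
    for residue in peptide.upper():
        if residue not in INDEX:
            raise ValueError(f"unknown amino acid: {residue!r}")
        bits = [0] * len(AMINO_ACIDS)
        bits[INDEX[residue]] = 1  # mark the position of this residue
        vector.extend(bits)
    return vector


# Hypothetical 8-residue peptide, encoded into 8 * 20 = 160 inputs.
encoded = one_hot_encode("SQNYPIVQ")
print(len(encoded))  # 160
```

Peptides of equal length yield vectors of equal length, so they can be fed directly to a classifier. The cost is a sparse, high-dimensional input that ignores biochemical similarity between residues.
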
Chapter 4 will focus on the genetic-epigenetic interplay pattern
discovery problem. The genetic signatures stand for genes and the
epigenetic signatures stand for methylation sites, DNA copy number, etc.
The main objectives of this kind of research include two types of the